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Abstract A novel trust region method for solving linearly constrained nonlinear programs is pre¬ 
sented. The proposed technique is amenable to a distributed implementation, as its salient ingredient 
is an alternating projected gradient sweep in place of the Cauchy point computation. It is proven that 
the algorithm yields a sequence that globally converges to a critical point. As a result of some changes 
to the standard trust region method, namely a proximal regularisation of the trust region subproblem, 
it is shown that the local convergence rate is linear with an arbitrarily small ratio. Thus, convergence 
is locally almost superlinear, under standard regularity assumptions. The proposed method is suc¬ 
cessfully applied to compute local solutions to alternating current optimal power flow problems in 
transmission and distribution networks. Moreover, the new mechanism for computing a Cauchy point 
compares favourably against the standard projected search as for its activity detection properties. 
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1 Introduction 

Minimising a separable smooth nonconvex function subject to partially separable coupling equality 
constraints and separable constraints, appears in many engineering problems such as Distributed Non¬ 
linear Model Predictive Control (DNMPC) (Necoara, I. and Savorgnan, C. and Tran Dinh, Q. and 
Suykens, J. and Diehl, M., 2009), power systems (Kim, B.H. and Baldick, R., 1997) and wireless 
networking (Chiang et al, 2007). Eor such problems involving a large number of agents, which result 
in large-scale nonconvex Nonlinear Programs (NLP), it may be desirable to perform computations 
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in a distributed manner, meaning that all operations are not carried out on one single node, but on 
multiple nodes spread over a network and that information is exchanged during the optimisation pro¬ 
cess. Such a strategy may prove useful to reduce the computational burden in the case of extremely 
large-scale problems. Moreover, autonomy of the agents may be hampered by a purely centralised algo¬ 
rithm. Case in points are cooperative tracking using DNMPC (Hours, J.-H. and Jones, C.N., 2016) or 
the Optimal Power Flow problem (OPF) over a distribution network (Gan, L. and Li, N. and Topcu, 
U. and Low, S.H., 2014), into which generating entities may be plugged or unplugged. Moreover, it 
has been shown in a number of studies that distributing and parallelising computations can lead to 
significant speed-up in solving large-scale NLPs (Zavala, V.M. and Laird, C.D. and Biegler, L.T., 
2008). Splitting operations can be done on distributed memory parallel environments such as clusters 
(Zavala, V.M. and Laird, C.D. and Biegler, L.T., 2008), or on parallel computing architectures such 
as Graphical Processing Units (GPU) (Fei, Y. and Guodong, R. and Wang, B. and Wang, W., 2014). 

Our objective is to develop nonlinear programming methods in which most of the computations 
can be distributed or parallelised. Some of the key features of a distributed optimisation strategy are 
the following: 

(i) Shared memory. Vectors and matrices involved in the optimisation process are stored on different 
nodes. This requirement rules out direct linear algebra methods, which require the assembly of 
matrices on a central unit. 

(ii) Concurrency. A high level of parallelism is obtained at every iteration. 

(iii) Cheap exchange. Global communications of agents with a central node are cheap (scalars). More 
costly communications (vectors) remain local between neighbouring agents. In general, the amount 
of communication should be kept as low as possible. It is already clear that globalisation strategies 
based on line-search do not fit with the distributed framework (Fei, Y. and Guodong, R. and 
Wang, B. and Wang, W., 2014), as these entail evaluating a ‘central’ merit function multiple times 
per iteration, thus significantly increasing communications. 

(iv) Inexactness. Gonvergence is ‘robust’ to inexact solutions of the subproblems, since it may be 
necessary to truncate the number of sub-iterations due to communication costs. 

(v) Fast convergence. The sequence of iterates converges at a fast (at least linear) local rate. Slow 
convergence generally results in a prohibitively high number of communications. 

As we are interested in applications such as DNMPG, which require solving distributed parametric 
NLPs with a low latency (Hours, J.-H. and Jones, G.N., 2016), a desirable feature of our algorithm 
should also be 

(vi) Warm-start and activity detection. The algorithm detects the optimal active-set quickly and enables 
warm-starting. 

Whereas a fair number of well-established algorithms exist for solving distributed convex NLPs (Bert- 
sekas, D.P. and Tsitsiklis, J.N., 1997), there is, as yet, no consensus around a set of practical methods 
applicable to distributed nonconvex programs. Some work (Zavala, V.M. and Laird, G.D. and Biegler, 
L.T., 2008) exists on the parallelisation of linear algebra operations involved in solving nonconvex 
NLPs with IPOPT (Wachter, A. and Biegler, L.T., 2006), but the approach is limited to very specific 
problem structures and the globalisation phase of ipopt (filter line-search) is not suitable for fully dis¬ 
tributed implementations (requirements (iii), (iv) and (vi) are not met). Among existing strategies ca¬ 
pable of addressing a broader class of distributed nonconvex programs, one can make a clear distinction 
between Sequential Gonvex Programming (SGP) approaches and augmented Lagrangian techniques. 

An SGP method consists in iteratively solving distributed convex NLPs, which are local approx¬ 
imations of the original nonconvex NLP. To date, some of the most efficient algorithms for solving 
distributed convex NLPs combine dual decomposition with smoothing techniques (Necoara, 1. and 
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Savorgnan, C. and Tran Dinh, Q. and Suykens, J. and Diehl, M., 2009; Tran-Dinh, Q. and Savorgnan, 
C. and Diehl, M., 2013). On the contrary, an augmented Lagrangian method aims at decomposing a 
nonconvex auxiliary problem inside an augmented Lagrangian loop (Cohen, G., 1980; Hamdi, A. and 
Mishra, S.K., 2011; Hours, J.-H. and Jones, C.N., 2014). While convergence guarantees can be derived 
in both frameworks, computational drawbacks also exist on both sides. For instance, it is not clear 
how to preserve the convergence properties of SCP schemes when every subproblem is solved to a low 
level of accuracy. Hence, (iv) is not satisfied immediately. Nevertheless, for some recent work in this 
direction, one may refer to (Tran Dinh, Q. and Necoara, I. and Diehl, M., 2013). The convergence rate 
of the algorithm analysed in (Tran Dinh, Q. and Necoara, I. and Diehl, M., 2013) is at best linear, thus 
not fulfilling (v). On the contrary, the inexactness issue can be easily handled inside an augmented 
Lagrangian algorithm, as global and fast local convergence is guaranteed even though the subproblems 
are not solved to a high level of accuracy (Fernandez, D. and Solodov, M.V., 2012; Conn, A. and Gould, 
N.I.M. and Toint, P.L., 1991). However, in practice, poor initial estimates of the dual variables can 
drive the iterative process to infeasible points. Moreover, it is still not clear how the primal noncon¬ 
vex subproblems should be decomposed and solved efficiently in a distributed context. The quadratic 
penalty term of an augmented Lagrangian does not allow for the same level of parallelism as a (convex) 
dual decomposition. Thus, requirement (ii) is not completely satisfied. To address this issue, we have 
recently proposed applying Proximal Alternating Linearised Minimisations (PALM) (Bolte, J. and 
Sabach, S. and Teboulle, M., 2014) to solve the auxiliary augmented Lagrangian subproblems (Hours, 
J.-H. and Jones, C.N., 2014,6). The resulting algorithm inherits the slow convergence properties of 
proximal gradient methods and does not readily allow one to apply a preconditioner. In this paper, a 
novel mechanism for handling the augmented Lagrangian subproblems in a more efficient manner is 
proposed and analysed. The central idea is to use alternating gradient projections to compute a Cauchy 
point in a trust region Newton method (Conn, A.R. and Gould, N.I.M. and Toint, P.L., 2000). 

When looking at practical trust region methods for solving bound-constrained problems (Zavala, 
V.M. and Anitescu, M., 2014), one may notice that the safeguarded Conjugate Gradient (sCG) algo¬ 
rithm is well-suited to distributed implementations, as the main computational tasks are structured 
matrix-vector and vector-vector multiplications, which do not require the assembly of a matrix on a 
central node. Moreover, the global communications involved in an sCG algorithm are cheap. Thus, sCG 
satisfies (i), (ii) and (iii). The implementation of CG on distributed architectures has been extensively 
explored (Verschoor, M. and Jalba, A.C., 2012; D’Azevedo, E. and Eijkhout, V. and Romine, C., 1993; 
Eei, Y. and Guodong, R. and Wang, B. and Wang, W., 2014). Eurthermore, a trust region update 
requires only one centralized objective evaluation per iteration. Erom a computational perspective, 
it is thus comparable to a dual update, which requires evaluating the constraints functional and is 
ubiquitous in distributed optimisation algorithms. However, computing the Cauchy point in a trust 
region loop is generally done by means of a projected line-search (Zavala, V.M. and Anitescu, M., 
2014) or sequential search (Conn, A.R. and Gould, N.I.M and Toint, P.L., 1988). Whereas it is broadly 
admitted that the Cauchy point computation is cheap, this operation requires a significant amount 
of global communications in distributed memory parallel environments, and is thus hardly amenable 
to such applications (Eei, Y. and Guodong, R. and Wang, B. and Wang, W., 2014). This hampers 
the implementability of trust region methods with good convergence guarantees on distributed com¬ 
puting platforms, whereas many parts of the algorithm are attractive for such implementations. The 
aim of this paper is to bridge the gap by proposing a novel way of computing the Cauchy point that 
is more tailored to the distributed framework. Coordinate gradient descent methods such as PALM, 
are known to be parallelisable for some partial separability structures (Bertsekas, D.P. and Tsitsiklis, 
J.N., 1997). Moreover, in practice, the number of backtracking iterations necessary to select a block 
step-size, can be easily bounded, making the approach suitable for ‘Same Instruction Multiple Data’ 
architectures. Therefore, we propose using one sweep of block-coordinate gradient descent to compute 
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a Cauchy point. As shown in paragraph 4.2 of Section 4, such a strategy turns out to be efficient at iden¬ 
tifying the optimal active-set. It can then be accelerated by means of an inexact Newton method. As 
our algorithm differs from the usual trust region Newton method, we provide a detailed convergence 
analysis in Section 4. Finally, one should mention a recent paper (Xue, D. and Sun, W. and Qi, L., 
2014), in which a trust region method is combined with alternating minimisations, namely the Alter¬ 
nating Directions Method of Multipliers (ADMM) (Bertsekas, D.P. and Tsitsiklis, J.N., 1997), but in a 
very different way from the strategy described next. The contributions of the paper are the following: 

— We propose a novel way of computing a Cauchy point in a trust region framework, which is suitable 
for distributed implementations. 

— We adapt the standard trust region algorithm to the proposed Cauchy point computation. Global 
convergence along with an almost Q-superlinear local rate is proven under standard assumptions. 

— The proposed trust region algorithm, entitled trap (Trust Region with Alternating Projections), 
is used as a primal solver in an augmented Lagrangian dual loop, resulting in an algorithm that 
meets requirements (i)-(vi), and is applied to solve OPF programs in a distributed fashion. 

In Section 2, some basic notion in variational analysis is recalled. In Section 3, our trap algorithm is 
presented. Its convergence properties are analysed in Section 4. Global convergence to a critical point 
is guaranteed, as well as almost Q-superlinear local convergence. Finally, the proposed algorithm is 
tested on OPF problems over transmission and distribution networks in Section 5. 


2 Background 

Given a closed convex set I?, the projection operator onto Q is denoted by Pq and the indicator 
function of Q is defined by 

Jo, iixeQ , 

in {x) - ^ iix^n . 

We define the normal cone to 17 at a; G C as 

Mn {x) := e : Vy e 17, (v, y - a;) < o| . 

The tangent cone to 17 at a: is defined as the closure of feasible directions at x (Rockafellar, R.T. 
and Wets, R.J.-B., 2009). Both Mn (x) and 7n (x) are closed convex cones. As 17 is convex, for 
all a; G 17, jVa (x) and Ta (x) are polar to each other (Rockafellar, R.T. and Wets, R.J.-B., 2009). 

Theorem 21 (Moreau’s decomposition (Moreau, 1962)). Let 1C be a closed convex cone in R'’* and K.° 
its polar cone. For all x,y, z & R'’*, the following two statements are equivalent: 

1. z = x + y with X & 1C, y G /C° and {x, y) = 0. 

2. X = P)c (z) and y = Pjco (z). 

The box-shaped set {a; G R'^ : Vi G {1, ... ,d} ,li < Xi < Ui} is denoted by B (/, u). For a: G R'^ 
and r > 0, the open ball of radius r centered around x is denoted by B {x,r). Given a; G 17, the set 
of active constraints at x is denoted by An {x)- Given a set S C R'^, its relative interior is defined as 
the interior of S within its affine hull, and is denoted by ri (S'). 

A critical point x* of the function f + lq with / differentiable, is said to be non-degenerate if 

-V/ (a;*) G ri [jVn (a;*)) . 
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Given a differentiable function / of several variables a;i,..., aiAr, its gradient with respect to variable Xi 
is denoted by \7if. Given a matrix M G its (i,j) element is denoted by Mij. 

A sequence {a;*} converges to x* at a Q-linear rate g G ]0,1[ if, for I large enough, 

\\x^+^-x*\\^ 

\W -X*\\2 ~ ^ ' 

The convergence rate is said to be Q-superlinear if the above ratio tends to zero as I goes to infinity. 


3 A Trust Region Algorithm with Distributed Activity Detection 

3.1 Algorithm Formulation 

The problem we consider is that of minimising a partially separable objective function subject to 
separable convex constraints. 


minimise L {wi,... ,Wm) (1) 

W 

s.t. Wi G Wi, Vi G {1,..., A} , 

where w := (wi ,. . . , G R”, with n = and W := Wi X ... X Wjv, where the 

sets Wi C R”* are closed and convex. The following Assumption is standard in distributed com¬ 
putations (Bertsekas, D.P. and Tsitsiklis, J.N., 1997). 

Assumption 31 (Golouring scheme). The sub-variables wi,... ,wn can be re-ordered and grouped 
together in such a way that a Gauss-Seidel minimisation sweep on the function L can be performed in 
parallel within K N groups, which are updated sequentially. In the sequel, the re-ordered variable 
is denoted by x = (xf,... ,x\Y. The set W is transformed accordingly into 12 = x ... x Qk- It 
is worth noting that each set 12k with k G {1,..., K} can then be decomposed further into sets Wi 
with i G {1,..., N}. 

As a consequence of Assumption 31, NLP (1) is equivalent to 

minimise L {xi ,..., Xk) 

X 

s.t. Xfc G 12/c, Vfc G {1,..., A} . 

Remark 31. Such a partially separable structure in the objective (Assumption 31) 
very often in practice, for instance when relaxing network coupling constraints via an 
grangian penalty. Thus, by relaxing the nonlinear coupling constraint C (wi,... ,wn) = 
equality constraints gi (wi) = 0 of 


is encountered 
augmented La- 
0 and the local 


N 

minimised fi{wi) 

wi ,...,wn ‘ ^ 

2=1 

s.t. C (wi,..., Wat) = 0 
gi (wi) = 0 
Wi G Wi 
iG{l,...,A} 
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in a differentiable penalty function, one obtains an NLP of the form (1). In NLPs resulting from the 
direct transcription of optimal control problems, the objective is generally separable and the constraints 
are stage-wise with a coupling between the variables at a given time instant with the variables of the 
next time instant. In this particular case, the number of groups is K = 2. In Section 5, we illustrate 
this property by means of examples arising from various formulations of the Optimal Power Flow 
(OPF) problem. The number of colours K represents the level of parallelism that can be achieved in 
a Gauss-Seidel method for solving (1). Thus, in the case of a discretised OOP, an alternating projected 
gradient sweep can be applied in two steps during which all updates are parallel. 

For the sake of exposition, in order to make the distributed nature of our algorithm apparent, we as¬ 
sume that every sub-variable Wi, with iG{l,...,Ai},is associated with a computing node. Two nodes 
are called neighbours if they are coupled in the objective I. Our goal is to find a first-order critical 
point of NLP (1) via an iterative procedure for which we are given an initial feasible point x^ G 17. The 
iterative method described next aims at computing every iterate in a distributed fashion, which re¬ 
quires communications between neighbouring nodes and leads to a significant level of concurrency. 

Assumption 32. The objective function I is bounded below on {a; G 17 : L{x) < L{x^)}. 

The algorithm formulation can be done for any convex set 17, but some features are more suitable 
for linear inequality constraints. 

Assumption 33 (Polyhedral constraints). For all k G {1,... ,K}, the set flk is a non-empty poly¬ 
hedron, such that 

17fc ;= {a; G R"'' : {LJk,t,x) < hk,i, i e {1,... ,mk}} , 
with ujk.i G R"'”, hk,i G R for all i G {1,... ,mk} and nk,mk > 1. 

Assumption 34. The objective function I is continuously differentiable in an open set contain¬ 
ing 12. Its gradient VI is uniformly continuous. 

It is well-known (Conn, A.R. and Gould, N.I.M. and Toint, P.L., 2000) that for problem (1), x* 
being a critical point is equivalent to 

Pn{x*-VL{x*)) =x* . (2) 

Algorithm 1 below is designed to compute a critical point x* of the function I -\- lq. It is essentially 
a two-phase approach, in which an active-set is first computed and then, a quadratic model is min¬ 
imised approximately on the current active face. Standard two-phase methods compute the active-set 
by means of a centralised projected search, updating all variables centrally. More precisely, a model of 
the objective is minimised along the projected objective gradient, which yields the Cauchy point. The 
model decrease provided by the Cauchy point is then enhanced in a refinement stage. Similarly to 
a two-phase method, in order to globalise convergence. Algorithm 1 uses the standard trust region 
mechanism. At every iteration, a model m of the objective function I is constructed around the 
current iterate x as follows 

m(^x') := L{x)-\-(VL{x) ,x'— x)-\-^(^x'— x,B (x) [x — x)) , (3) 

where x' G R” and B (x) is a symmetric matrix. 

Assumption 35 (Uniform bound on model hessian). There exists B > 0 such that 

\\B{x)\\,<B , 


for all X & f2. 
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Algorithm 1 Trust Region Algorithm with Alternating Projections (trap) 

1 

Parameters: Initial trust region radius A, update parameters cri, cr 2 and <73 such that 0 < cri < cr 2 < 1 < < 73 , 
test ratios rji and 772 such that 0 < 771 < 772 < 1, coefficients 71 G ]0,1[ and 72 > 0, termination tolerance e. 

2 

Input: Initial guess x, projection operators objective function L, objective gradient VL. 

3 

while \\Pf 2 ~ VL (a:)) — x \\2 > e do 


4 

Distributed activity detection (alternating gradient projections): 


5 

for k = \ K do 


6 

^ k '^ (-^[1, fc —1]| » ^k : + ) ) ) 

> In parallel in group k 

7 

where is computed according to requirements (5), (6) and (7). 


8 

end for 


9 

Distributed refinement (Algorithm 2 ): 


10 

Find y £ Q such that 


11 

m (x) — m (y) > 71 (tti (a:) — m {z)) 


12 

\\y-x\\^ < 72 A 


13 

An,, (zfc) c An,, ivk ) for all fc e {1,..., K }. 


14 

Trust-region update: 


15 

p i — ^(.y)/ m { x )— m { y ) 


16 

if p < 771 then 

> Not successful 

17 

(Do not update x) 


18 

Take A within [crizi, (72/\] 


19 

else if p G [ 771 , 772 ] then 

> Successful 

20 

x-^y 


21 

Take A within [criA^a^A] 


22 

Update objective gradient VL (a:) and model hessian B (x). 


23 

else 

0 Very successful 

24 

X <- y 


25 

Take A within [A, as A] 


26 

Update objective gradient VL (a:) and model hessian B ( x ) 


27 

end if 


28 

end while 



The following Assumption is necessary to ensure distributed computations in Algorithm 1. It is 
specific to Algorithm 1 and does not appear in the standard trust region methods (Burke et al, 1990). 

Assumption 36 (Structured model hessian). For all x £ f2, for all i,j G {1,... ,n}, 

(a;) = 0 Bij (a;) = 0 . 


Remark 32. It is worth noting that the partial separability structure of the objective function L is 
transferred to the sparsity pattern of its hessian hence, by Assumption 36, to the sparsity pattern 

of the model hessian B. Hence, a Gauss-Seidel sweep on the model function m can also be carried out 
in K parallel steps. 

The main characteristic of trap is the activity detection phase, which differs from the projected 
search in standard trust region methods (Burke et al, 1990). At every iteration, trap updates the 
current active-set by computing iterates zi,..., zk (Lines 4 to 8). This is the main novelty of trap, 
compared to existing two-phase techniques, and allows for different step-sizes qi, ..., ax per block of 
variables, which is relevant in a distributed framework, as the current active-set can be split among 
nodes and does not need to be computed centrally. In the trust region literature, the point 

0 := [zl,...,ZKy , (4) 

is often referred to as the Cauchy point. We keep this terminology in the remainder of the paper. It is 
clear from its formulation that TRAP allows one to compute Cauchy points via independent projected 
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searches on every node. Once the Cauchy points zi,... ,zk have been computed, they are used in 
the refinement step to compute a new iterate y that satisfies the requirements shown from Lines 9 
to 13. The last step consists in checking if the model decrease m{y) — m(x) is close enough to the 
variation in the objective L (Lines 16 to 27). In this case, the iterate is updated and the trust region 
radius A increased, otherwise the radius is shrunk and the iterate frozen. This operation requires a 
global exchange of information between nodes. 

In the remainder, the objective gradient VL(x) is denoted by g {x) and the objective hessian 
(a;) by H {x). The model function m is an approximation of the objective function L around the 
current iterate x. The quality of the approximation is controlled by the trust region, defined as the 
box 


M {x — A,x + A) , 


where A is the trust region radius. 

In the rest of the paper, we denote the Cauchy points by Zk or Zk (c^fc) without distinction, where 
are appropriately chosen step-sizes. More precisely, following Section 3 in (Burke et al, 1990), in trap, 
the block-coordinate step-sizes are chosen so that for all k G {1,... ,77}, the Cauchy points Zk 
satisfy 


^ — 1]5 1 X^k+\,K\ ) ^ ^ —1|: Xk^ + l ,X] ) 

^ r'o k'kkl (^Z^i^k—lji Xk^ ^|/c + l,iC|) : ) i (^) 

. \\Zk-Xk\\^ < y2A 

with i/Q G ]0,1[ and i >2 > 0, where z^i^k-i} stands for (^zj ,..., Zk-i)'^, along with the condition that 
there exists positive scalars v\ < V 2 , V 3 , vi and for all k G {1, ..., 77}, 

ake[v4,,V5] or a/c e [P3<5fc, i/s] (6) 

where the step-sizes dfc are such that one of the following conditions hold for every fcG{l,...,77}, 

— 1| ? Zk (o^fc) : + ^ Xn , Xk, ) (^) 

+ i /0 (Vfcm [zii^k-i^,Xk,xik+x^Ki) ,Zk (dfc) - Xk) , 


or 


\\zk{ak)-Xk\\^>i'iA , ( 8 ) 

Conditions (5) ensure that the step-sizes ak are small enough to enforce a sufficient decrease 
coordinate-wise, as well as containment within a scaled trust region. Conditions (6), (7) and (8) 
guarantee that the step-sizes ak do not become arbitrarily small. All conditions (5), (6) and (7) can 
be tested in parallel in each of the 77 groups of variables. In the next two paragraphs 3.2 and 3.3 of this 
section, the choice of step-sizes ak ensuring the sufficient decrease is clarified, as well as the distributed 
refinement step. In the next Section 4, the convergence properties of trap are analysed. Numerical 
examples are presented in Section 5. 
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3.2 Step-sizes Computation in the Activity Detection Phase 

At a given iteration of trap, the step-sizes Uk are computed by backtracking to ensure a sufficient de¬ 
crease at every block of variables and coordinate-wise containment in a scaled trust region as formalised 
by (5). It is worth noting that the coordinate-wise backtracking search can be run in parallel among the 
variables of group k, as they are decoupled from each other. As a result, there is one step-size per sub¬ 
variable Wi in group k. Yet, for simplicity, we write it as a single step-size a^- The reasoning of Section 
4 can be adapted accordingly. The following Lemma shows that a coordinate-wise step-size ak can be 
computed that ensures conditions (5), (6), (7) and (8) on every block of coordinates k G {1,... ,K}. 

Lemma 31. Assume that Assumption 35 holds. For all k G {!,..., 77}, an iterate Zk satisfying 
conditions (5), (6), (7) and (8) can be found after a finite number of backtracking iterations. 

Proof. Let /c G {!,..., K}. We first show that for a sufficiently small ttfc, conditions (5) are satisfied. By 
definition of the Cauchy point Zk, 

= argmin (Vkm(zii^k-ij,Xk,xik+i^Kj),z-Xk) + ^^\\z-Xk\\l , 
which implies that 


kna Xk: : ^k Xk^ 2 ^ W^k II 2 — ^ ? 

Hence, as uq G ]0, 1[, it follows that 

(Vfcm /yj ^ ^ Zk T 11'^^ II 2 — 

PO knl (^Z^i — , Xk^ ^fk+l.KJ ) 7 ^k 

However, from the descent Lemma, which can be applied since the model gradient is Lipschitz con¬ 
tinuous by Assumption 35, 

^ —+ —^|A:+l,iC]) T (VfeTU Xk^ i ^k Xk') 

T INfc ~ ®fc|l2 • 


By choosing 


ak < 


1 - iZQ 

B 


condition (5) is satisfied after a finite number of backtracking iterations. Denoting by the smallest 
integer such that requirement (5) is met, ak can be written 


ak = c®'' • 


where c G ]0,1[ and > 0. Then, condition (6) is satisfied with 1^4 = and 1/3 = c. 


□ 


Lemma 31 is very close to Theorem 4.2 in (More, J.J., 1988), but the argument regarding the 
existence of the step-sizes ak is different. 
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3.3 Distributed Computations in the Refinement Step 

In Algorithm 1, the objective gradient g (x) and model hessian B (x) are updated after every success¬ 
ful iteration. This task requires exchanges of variables between neighbouring nodes, as the objective 
is partially separable (Ass. 31). Node i only needs to store the sub-part of the objective function L 
that combines its variable Wi and the variables associated to its neighbours. However, the refinement 
step (line 9 to 13 in Algorithm 1), in which one obtains a fraction of the model decrease yielded by 
the Cauchy points zi,..., Zk, should also be computed in a distributed manner. As detailed next, 
this phase consists in solving the Newton problem on the subspace of free variables at the current 
iteration, which is defined as the set of free variables at the Cauchy points zi,... ,zk- In order to 
achieve a reasonable level of efficiency in the trust region procedure, this step is generally performed 
via the Steihaug-Toint CG, or sCG (Steihaug, T., 1983). The sCG algorithm is a CG procedure that is 
cut if a negative curvature direction is encountered or a problem bound is hit in the process. Another 
way of improving on the Cauchy point to obtain fast local convergence is the Dogleg strategy (No- 
cedal, J. and Wright, S., 2006). However, this technique requires the model hessian B to be positive 
definite (Nocedal, J. and Wright, S., 2006). This condition does not fit well with distributed computa¬ 
tions, as positive definiteness is typically enforced by means of BFGS updates, which are know for not 
preserving the sparsity structure of the objective without non-trivial modifications and assumptions 
(Yamashita, N., 2008). Compared to direct methods, iterative methods such as the sCG procedure 
have clear advantages in a distributed framework, for they do not require assembling the hessian 
matrix on a central node. Furthermore, their convergence speed can be enhanced via block-diagonal 
preconditioning, which is suitable for distributed computations. In the sequel, we briefly show how a 
significant level of distributed operations can be obtained in the sCG procedure, mainly due to the 
sparsity structure of the model hessian that matches the partial separability structure of the objective 
function. More details on distributed implementations of the CG algorithm can be found in numerous 
research papers (Verschoor, M. and Jalba, A.C., 2012; D’Azevedo, E. and Eijkhout, V. and Romine, 
C., 1993; Eei, Y. and Guodong, R. and Wang, B. and Wang, W., 2014). The sCG algorithm that is 
described next is a rearrangement of the standard sCG procedure following the idea of (D’Azevedo, 
E. and Eijkhout, V. and Romine, C., 1993). The two separate inner products that usually appear in 
the CG are grouped together at the same stage of the Algorithm. 

An important feature of the refinement step is the increase of the active set at every itera¬ 
tion. More precisely, in order to ensure finite detection of activity, the set of active constraints at 
the points yi,..., yx, obtained in the refinement phase, needs to contain the set of active constraints 
at the Cauchy points z\,... ,zk, as formalised at line 13 of Algorithm 1. This requirement is very 
easy to fulfil when 17 is a bound constraint set, as it just requires enforcing the constraint 

yk,i ^k,i: '^k }" ■ —k^j ^k,j j” 

for all groups fcG {l,...,Ar} in the trust region problem at the refinement step. 

Eor the convergence analysis that follows in Section 4, the refinement step needs to be modified 
compared to existing trust region techniques. Instead of solving the standard refinement problem 

minimise {g (x) ,p) + \{p,B {x) p) 

p 2 

s.t. < 72^ 

X + p G f2 

An {z) C An {x + p) , 
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in which the variables corresponding to indices of active constraints at the Cauchy point are fixed 
to zero, we solve a regularised version 

minimise {g {x) ,y - x) + ^ {y - x, B (x) {y - a;)) + ^ \\y - z\\l (9) 

s-t- Il 2 /-a;||„^ < 72 ^ 

As7 {z) c An {y) , 


where a G ]£, d[ with (t > 0, and 2 ; is the Cauchy point yielded by the procedure described in the 
previous paragraph 3.2. The regularisation coefficient a should not be chosen arbitrarily, as it may 
inhibit the fast local convergence properties of the Newton method. This point is made explicit in 
paragraph 4.3 of Section 4. The regularised trust region subproblem (9) can be equivalently written 


minimise {go- {x) ,p) + x (p, Ba {x)p) 

p 2, 

s.t. X + p G C 

IIpIL < 7241 

An {z) C An {x + p) , 


( 10 ) 


with 

ga {x) := g{x) - (t{z - x), B^ {x) := B {x) +'^I . (11) 

As in standard trust region methods, we solve the refinement subproblem (10) by means of CG it¬ 
erations, which can be distributed as a result of Assumption 36. In order to describe this stage in 
Algorithm 2, one needs to assume that 17 is a box constraint set. In the remainder, we denote by Z 
the matrix whose columns are an orthonormal basis of the subspace 

V {z) := {a; G R” : Xk) = 0, i G An^ (zfc), fc G { 1 , ..., K}} . 


Remark 33. It is worth noting that the requirement m{x) — m{y) > 71 {m(x) — with 71 < 1, 

is satisfied after any iteration of Algorithm 2, as the initial guess is the Cauchy point z and the sCG 
iterations ensure monotonic decrease of the regularised model (Theorem 2.1 in (Steihaug, T., 1983)). 

Remark 34. It is worth noting that the sparsity pattern of the reduced model hessian B„ has the same 
structure as the sparsity pattern of the model hessian B, as the selection matrix Z has a block-diagonal 
structure. Moreover, the partial separability structure of the objective matches the sparsity patterns of 
both the hessian and the reduced hessian. For notational convenience, Algorithm 2 is written in terms 
of variables xi,... ,xk, but it is effectively implementable in terms of variables wi ,..., wjv- The inner 
products (Lines 6 to 8 ) and updates (Lines 11 to If, lines 20 to 22) can be computed in parallel at 
every node, as well as the structured matrix-vector product (Line 5). 

In Algorithm 2, the reduced model hessian B can be evaluated when computing the product at fine 
5, which requires local exchanges of vectors between neighbouring nodes, since the sparsity pattern 
of B represents the coupling structure in the objective L. From a distributed implementation perspec¬ 
tive, the more costly parts of the refinement procedure 2 are at line 9 and line 16. These operations 
consist in summing up the inner products from all nodes and a minimum search over the step-sizes 
that ensure constraint satisfaction and containment in the trust region. They need to be performed on 
a central node that has access to all data from other nodes, or via a consensus algorithm. Therefore, 
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Algorithm 2 Distributed Safeguarded Conjugate Gradient (sCG) 

1: Input: reduced model hessian Ba '= BaZ^ reduced gradient g := Z'^ g, initial guess z := Z'^z 
2: Parameters: stopping tolerance e := J II 5 II 2 with ^ G ]0,1[ 

3: Initialise x, p, v, f, t and -Uprev via a standard sCG iteration using z, x, Z, Ba-, Ba- and g 
4: while u> and t > 0 do 

5: Compute structured matrix-vector product s ■<— B^t > Local communications 

6: for k = 1 ... K do > In parallel among K groups 

7: Compute and (ffc,Sfe} 

8 : end for 

9: u-(r- S ^ {'^k>^k) ■> Global summations 

10: Compute step-sizes /3 ■<— ^/ttprev and t 5 — 

11: for k = 1 ... K do > In parallel among K groups 

12: Update conjugate direction Pfc + (^Pk and $Vk 

13: Compute smallest step-size such that xj^ + aj^p^ hits a bound xj^ or the trust region boundary 

14: end for 

15: if t < 0 then > Negative curvature check 

16: Compute step-size a ■<— min {ai,..., ax} to hit boundary of B (a; — a; -h A) fl i? 

17: else 

18: Compute standard CG step-size a ■<— 

19: end if 

20: for k = 1... if do > In parallel among K groups 

21: Update iterate •<— -|- and residual ■<— 

22 : end for 

23: ’^prev ^^ 

24: end while 

25: Output: y z + Z{x — z) 



Fig. 1: Workflow at node j in terms of local computations, communications with the set of neighbours 
Mj and a central node. Note that we use the index j for a node, and not k, which corresponds to a 
group of nodes, in which computations are performed in parallel. Thus, the nodes in the set A/} are 
not in the same group as node j. Thick arrows represent communications involving vectors, whereas 
thin arrows stand for communications of scalars. Matrix Ej is defined at Eq. (15). 


lines 9 and 16 come with a communication cost, although the amount of transmitted data is very 
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small (one scalar per node). In the end, one should notice that the information that is required to be 
known globally by all nodes {1,..., Af} is fairly limited at every iteration of trap. It only consists 
of the trust region radius A and the step-sizes d and /? in the refinement step 2. Finally, at every 
iteration, all nodes need to be informed of the success or failure of the iteration so as to update or 
freeze their local variables. This is the result of the trust region test, which needs to be carried out 
on a central node. In Figure 1, we give a sketch of the workflow at a generic node j. One can notice 
that, in terms of local computations, trap behaves as a standard two-phase approach on every node. 


4 Convergence Analysis 

The analysis of trap that follows is along the lines of the convergence proof of trust region methods 
in (Burke et al, 1990), where the Cauchy point is computed via a projected search, which involves 
a sequence of evaluations of the model function on a central node. However, for trap, the fact that 
the Cauchy point is yielded by an distributed projected gradient step on the model function requires 
some modifications in the analysis. Namely, the lower bound on the decrease in the model and the 
upper bound on criticality at the Cauchy point are expressed in a rather different way. However, the 
arguments behind the global convergence proof are essentially the same as in (Burke et al, 1990). 

In this section, for theoretical purposes only, another first-order criticality measure different from 
(2) is used. We utilise the condition that a;* G C is a first-order critical point if the projected gradient 
at X* is zero. 


Vi7L(x*)=0, ( 12 ) 

where, given a; G C, the projected gradient is defined as 

VnL{x):=Pr^(^){-g{x)) . 

Discussions on this first-order criticality measure can be found in (Conn, A.R. and Gould, N.I.M. and 
Toint, P.L., 2000). It is equivalent to the standard optimality condition 

(<7 (x*) , X — X*) > 0, for all a; G C . 

It follows from Moreau’s decomposition that a point x* satisfying (12) automatically satisfies (2). Con¬ 
sequently, it is perfectly valid to use (2) for the convergence analysis of trap. 


4.1 Global Convergence to First-order Critical Points 

We start with an estimate of the block-coordinate model decrease provided by the Cauchy points 2^, fc G 
{1,..., A}, of Algorithm 1. For this purpose, we define for all fc G {1,..., A}, 

mk{x) ,xik+i^Ki) , (13) 

where x' G R"'”. This corresponds to the model function evaluated at x' with the block-coordinates 1 
to fe — 1 being fixed to the associated Cauchy points z\,..., Zk-i and the block-coordinates k-\-l to K 
having values Xk+i, ■ ■ ■, Xk- Note that by definition of the function m^, 

rrik (zk) = mfc+i (xk+i) , 


for all fc G A - 1}. 
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Lemma 41. There exists a constant x > 0 such that, for all k £ {I,..., K}, 

1 II Xk II2 


/ \ / \ \ W^k Xk 2 

ruk [Xk) - ruk [Zk] > X -^ mm 

Ctk 


’^ + \\B[x) 


(Xk 


(14) 


Proof. The proof goes along the same lines as the one of Theorem 4.3 in (More, J.J., 1988). Yet, some 
arguments differ, due to the alternating projections. We first assume that condition (5) is satisfied 
with 


ak > V4 ■ 

Using the basic property of the projection onto a closed convex set, we obtain 


Wfc [xk) - rUk [zk) > izQUA 
We then consider the second case when 


\^k ^fc||2 


ak > nsUk ■ 

The first possibility is then 

rrik [zk (ak)) - rUk (xk) > izo {^rnk (xk) , Zk (ak) - Xk) ■ 

However, by definition of the model function in Eq. (13), the left-hand side term in the above inequality 
is equal to 

(gk (x) ,Zk - Xk) + {zk - Xk,EkB (x) El (zk - Xk)) 

T — 1 ] — 1 |? — 1 ] E (x) Ek (zk Xk)') 

= ]^{zk- Xk,EkB (x) El (zk - Xk)) + (Vmk (xk ), Zk - Xk) , 

where, given /cG {I,..., K}, the matrix Ek G is such that for i G {1, ..., n^}, 

Ek(i,ni + ... + Uk-i +i) = l , (15) 

and all other entries are zero. This yields, by the Cauchy-Schwarz inequality 

lISfc - a;fc||2 > - (1 - no) (VrUk (xk), Zk - Xk) 

> ^ ll^fc - Xk\\\ 
ak 

Hence, 

_ 2 (I-no) 

“'"-! + ||B(a;)||2 

The second possibility is 

ll^fc -ajfcll,^ > . 
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For this case, (More, J.J., 1988) provides the lower bound 

\\Zk - Xk\\2 > ■ 


Finally, inequality (14) holds with 

X := izo min{i/4, 2 (1 - vq) , 


□ 


From Lemma 41, an estimate of the decrease in the model provided by the Cauchy point a is 
derived. 

Corollary 41 (Sufhcient decrease). The following inequality holds 


K 

m {x) - m (z) > 

k=l 


ll^fc Xk \\2 

Oik 


min 



1 II Zk Xk 112 1 

1 + Il^(a;)ll2 / 


(16) 


Proof. This is a direct consequence of Lemma 41 above, as 

K 

m{x) - m (z) = '^mk {xk) - ruk {zk) 

k=l 


from the definition of rUk in Eq. (13). □ 

In a similar manner to (Burke et al, 1990), the level of criticality reached by the Cauchy point z 
is measured by the norm of the projected gradient of the objective, which can be upper bounded by 
the difference between the current iterate x and the Cauchy point z. 

Lemma 42 (Relative error condition). The folloujing inequality holds 

\\S7aLiz)\\^<K\\Bix)\\^\\z-x\\2 + f2C^^^^^ + \\9k{z)-gk(x)\\2) • (17) 

Proof. From the definition of Zk as the projection of 

Xk - akVmk {xk) 

onto the closed convex set Qk, there exists Vk G (z^k) such that 

0 = Vk+ Vnik {xk) + — - — . 

Oik 

Hence, 

lli^fc + Qk ( 2)112 < llflfc {z) - gk (a ;)||2 + ||5 (a :)||2 ||z - a ;||2 + —— 

(Xk 

However, 

(-5fc ( 2 ))+Sfc ( 2 )||^ < \\vkPgk{z )\\2 , 
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and by Moreau’s decomposition theorem, 

-gk {z) = {-Qk {z)) + Pra, (zc (-5fc {z)) . 

Thus, 

\\Pr,,M i-9k W)|| < hk {z) - gk {x)\\^ + \\B{x)\\^ 11^ - rrll^ + 

11 “ 112 ak 

As the sets {flk}^^i are closed and convex, 

Tn {z) = Tbi (^i) X ... X Tq^ (zk) ■ 


Subsequently, 

K 

\\^aL{z)\\^<J2\\PT,,i.,){-gk (z)) 

k=l 


and inequality (17) follows. 


□ 


Based on the estimate of the model decrease (16) and the relative error bound (17) at the Cauchy 
point z, one can follow the standard proof mechanism of trust region methods quite closely (Burke 
et al, 1990). Most of the steps are proven by contradiction, assuming that criticality is not reached. The 
nature of the model decrease (16) is well-suited to this type of reasoning. Hence, most of the ideas of 
(Burke et al, 1990) can be adapted to our setting. 

Lemma 43. If Assumptions 32, 34 and 35 are satisfied, then the sequence of iterates yielded by 
Algorithm 1 satisfies that for all k G {1,..., K}, 


. „ \\Zk Xk\\2 
limmt- 


= 0 , 


(18) 


Proof. For the sake of contradiction, assume that there exists a block index fco G {1,..., K} and e > 0 
such that 


14 




> e 


for all iteration indices I > 1. Using Corollary 41, the standard proof mechanism of trust region 
methods (Burke et al, 1990) can be easily adapted to obtain (18). □ 


We are now ready to state the main Theorem of this section. It is claimed that all limit points of 
the sequence {a;*} generated by trap are critical points of (1). 

Theorem 41 (Limit points are critical points). Assume that Assumptions 32, 34 and 35 hold. If x* 
is a limit point of {a;*}, then there exists a subsequence {h} such that 


lim 

—f + OO 




= 0 


( 19 ) 


Moreover, VqL (a;*) = 0, meaning that x* is a critical point of L + lq. 
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Proof. Let } be a subsequence of such that —)■ a:*. If for all k G {1,..., 



a 


h 

k 


( 20 ) 


then the proof is complete, via Lemma 42 and the fact that the step-sizes are upper bounded 
by 1 / 5 . In order to show (20), given e > 0 one can assume that there exists ko G If} such 

that for all i > 1, |hfco“‘^'=olL/“L), ^ One can then easily combine the arguments in the proof of 
Theorem 5.4 in (Burke et al, 1990) with Corollary 41 and Lemma 42 in order to obtain (19). □ 


Theorem 41 above proves that all limit points of the sequence {a;*} generated by trap are critical 
points. It does not actually claim convergence of {a;*} to a single critical point. However, such a result 
can be obtained under standard regularity assumptions (Nocedal, J. and Wright, S., 2006), which 
ensure that a critical point is an isolated local minimum. 

Assumption 41 (Strong second-order optimality condition). The sequence {a;*} yielded by trap has 
a non-degenerate limit point x* such that for all v G Afa (a;*)^, where 


AAfi (a;*)■'■:= {d G R” : \lw e Mn {x*) , (w,t)=0} , 


( 21 ) 


one has 


{v,H{x*)v) > n\\v\\l 


( 22 ) 


where k > O. 

Theorem 42 (Convergence to first-order critical points). If Assumptions 35, 32, 34 and 41 are 
fulfilled, then the sequence {a;*} generated by trap converges to a non-degenerate critical point x* 
of L -\- lq. 

Proof. This is an immediate consequence of Corollary 6.7 in (Burke et al, 1990). □ 


4.2 Active-set Identification 

In most of the trust region algorithms for constrained optimisation, the Cauchy point acts as a predic¬ 
tor of the set of active constraints at a critical point. Therefore, a desirable feature of the novel Cauchy 
point computation in TRAP is finite detection of activity, meaning that the active set at the limit point 
is identified after a finite number of iterations. In this paragraph, we show that trap is equivalent to 
the standard projected search in terms of identifying the active set at the critical point a;* defined in 
Theorem 42. 

Lemma 44. Given a face T of 12, there exists faces iFi,..., Tk of f2\,, Qk respectively, such 
that J- = J-\ X ... X J'K ■ 

Remark 41. Given a point x & Q, there exists a face T of f2 such that x G ri(J^). The normal 
cone to Q at x is the cone generated by the normal vectors to the active constraints at x. As the 
set of active constraints is constant on the relative interior of a face, one can write without distinc¬ 
tion Mn {x) or J\f (T). 
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The following Lemma is similar in nature to Lemma 7.1 in (Burke et al, 1990), yet with an adap¬ 
tation in order to account for the novel way of computing the Cauchy point. In particular, it is only 
valid for a sufficiently high iteration count, contrary to Lemma 7.1 of (Burke et al, 1990), which can be 
written independently of the iteration count. This is essentially due to the fact that the Cauchy point 
is computed via an alternating projected search, contrary to (Burke et al, 1990), where a centralised 
projected search is performed. 

Lemma 45. Assume that Assumptions 35, 32, 34 and 4 I hold. Let x* be a non-degenerate critical 
point of (1) that belongs to the relative interior of a face T* of Q. Let ^6 faces of 

such that T* = Tf x ... x and thus x^. G ri {iFf), for all k G {1,..., K}. 

Assume that x^ ^ x*. For I large enough, for all k G {1,... ,K'\ and all > 0, there exists Ck > 0 
such that 

xi e B {x*k,ek) => Pn,, (xi - tkVmk (xi'^'^ , 

for all tk G ]0, a^]. 

Proof. Similarly to the proof of Lemma 7.1 in (Burke et al, 1990), the idea is to show that there exists 
a neighbourhood of xl such that if x\, lies in this neighbourhood, then 

xi - akVmk (xi'j en{pf-\-Af (Pf)) . 

Lemma 45 then follows by using the properties of the projection operator onto a closed convex set 
and Theorem 2.3 in (Burke et al, 1990). 

For simplicity, we prove the above relation for fe = 2. It can be trivially extended to all indices k 
in {3,..., K}. Let Q !2 > 0 and I > 1. 

X 2 - a2Vm2 (x 2 'j = X 2 - a- 2 g 2 j - a 2 E 2 B j Ef ^z{ - xi'j , 

where the matrix Ek is defined in (15). As x* is non-degenerate, 

X* — a 2 g (a;*) G ri (P*) -h ri (A/" (A"*)) . 

However, as the sets {Pk'\k=i convex, one has (Rockafellar, R.T. and Wets, R.J.-B., 2009) 
ri (A"*) = ri (A"! ) X ... ri [Pk) and M (P*) = M (^" 1 ) X ... X A/" {Pk) ■ 

Hence, 

X 2 - Ct 2 g 2 {x*) G ri [P 2 ) + ri (A/ (P 2 )} = int (P 2 -f-Af (P 2 )) , 

by Theorem 2.3 in (Burke et al, 1990). By continuity of the objective gradient g, there exists 52 > 0 
such that 

I a;*—a:* I <52 a ;2 — 0252 (^a;* j G int (^"2 + .^ (-^ 2 )) • 

However, as shown beforehand (Lemma 43), 

lim \\z{ — a;iII = 0 . 

/—>- + oo II II 2 
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Moreover, E 2 B (a;*) Ej is bounded above (Ass. 35), subsequently for I large enough, 

X 2 - a 2 Vm 2 G int (E 2 + Af {E 2 )) C ri [E 2 + M {E 2 )) , 

by Theorem 2.3 in (Burke et al, 1990). Then, Lemma 45 follows by properly choosing the radii Ck so 
that X;f=i tfc = (min{4}f=i) • □ 

We have just shown that, for a large enough iteration count I, if the primal iterate is sufficiently 
close to the critical point x* and on the same face E*, then the set of active constraints at the Cauchy 
point z’’ is the same as the set of active constraints at x*. 

Theorem 43. If Assumptions 35, 32, 34 and 4 I are fulfilled, then the following holds 

lim I fa;*') || = 0 . 

f+00 II V / II2 

Moreover, there exists lo such that for all I > lo. 

An = An {x*) . 

Proof. The reasoning of the proof of Theorem 7.2 in (Burke et al, 1990) can be applied using Lemma 
45 and line 13 in Algorithm 1. The first step is to show that the Cauchy point 2 identifies the optimal 
active set after a finite number of iterations. This is guaranteed by Theorem 2.2 in (Burke et al, 1990), 
since Vr 2 L(z*) —> 0 by Theorem 41, and the sequence {a;*} converges to a non-degenerate critical 
point by Theorem 42. Lemma 45 is used to show that if a;* is close enough to x*, then the Cauchy 
point 0 * remains in the relative interior of the same face, and thus the active constraints do not change 
after some point. □ 

Theorem 43 shows that the optimal active set is identified after a finite number of iterations, which 
corresponds to the behaviour of the gradient projection in standard trust region methods. This fact is 
crucial for the local convergence analysis of the sequence {a;*}, as fast local convergence rate cannot 
be obtained if the dynamics of the active constraints does not settle down. 


4.3 Local Convergence Rate 

In this paragraph, we show that the local convergence rate of the sequence {a;*} generated by trap 
is almost Q-superlinear, in the case where a Newton model is approximately minimised at every trust 
region iteration, that is 


in model (3). Similarly to (11), one can define 

Ha-.= H+'^I . (23) 

To establish fast local convergence, a key step is to prove that the trust region radius is ultimately 
bounded away from zero. It turns out that the regularisation of the trust region problem (9) plays an 
important role in this proof. As shown in the next Lemma 46, after a large enough number of iterations, 
the trust region radius does not interfere with the iterates and an inexact Newton step is always taken 
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at the refinement stage (Line 9 to 13), implying fast local convergence depending on the level of accu¬ 
racy in the computation of the Newton direction. However, Theorem 7.4 in (Burke et al, 1990) cannot 
be applied here, since due to the alternating gradient projections, the model decrease at the Cauchy 
point cannot be expressed in terms of the projected gradient on the active face at the critical point. 

Lemma 46. If Assumptions 35, 32, 34 and 4 I are fulfilled, then there exists an index h > I and A* > 
0 such that for all I > h, > A*. 

Proof. The idea is to show that the ratio p converges to one, which implies that all iterations are 
ultimately successful, and subsequently, by the mechanism of Algorithm 1, the trust region radius is 
bounded away from zero asymptotically. For all / > 1, 

L{y'^) - L{x'') - {g{x'') ,y^ - x'-) - - x\H 

\p ~ ^ ~ TTi m ■ 

I 1 m (a;‘) — m (y‘) 


However, 


and 


i f i\ I f i\ I f I f , I ( I ( 

m lx I—m ly \ = m (a; I—m Iz j + m Iz I—m ly j 


^ ^ II * * ^ ‘ 


> 


> 


2 
min 


in{p,a} 
2 

min { 77 , a} 


l l\ 


l l 


z - X \\y - z 
2 


, ,, i i i i 

max < \\z — X , y — z 

I II 2 


Hence, 


\\p <\\y - z\\ + 2 - x 


< 2 max I y^ — z^ , z^ — x^ \ 

L 2 2 ) 


i I i 

m [x \ — m 


(^0 


> 


iin {^,4 II 012 


P 


Moreover, using the mean-value theorem, one obtains that the numerator in (24) is smaller than 

|2 

•) 

\2 


1 ,/ /] 
2 '*/’ p 


where 


Subsequently, we have 


'ij}' := sup \\h (x‘ + rp^) — H (x’^') I 
Tg[o.i] II ^ '' ^ ^1 


I * 1 1 ^ ^ A 

P - 1 < 

I I mm|77,CT| 


(25) 
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and the result follows by showing that p* converges to zero. Take I > Iq, where lo is as in Theorem 
43. Thus, p* G Af However, from the model decrease, one obtains 

^(p\H(ir')p')<(-p(ai'),p') . 

From Theorem 42, the sequence {a;*} converges to x*, which satisfies the strong second-order opti¬ 
mality condition 41. Hence, by continuity of the hessian and the fact that Af^ (x^) = An (x*), 
one can claim that there exists h > lo such that for all I > h, for all v G JVn 

(v,h{x‘'^ > k\\v\\1 . 

Thus, by Moreau’s decomposition, it follows that 

^ ||p'||^ < {-g (a;')) + PMai^‘) {-g (a;')) 

<||j’r„(.o(-5M)ILIk1L ’ 

since p* G Finally, p* converges to zero, as a consequence of Lemma 42 and the fact 

that IIz* — a:* II 2 converges to 0, by Lemma 42 and the fact that the step-sizes ak are upper bounded 
for fc G □ 

The refinement step in trap actually consists of a truncated Newton method, in which the Newton 
direction is generated by an iterative procedure, namely the distributed sCG described in Algorithm 
2. The Newton iterations terminate when the residual s is below a tolerance that depends on the 
norm of the projected gradient at the current iteration. In Algorithm 2, the stopping condition is set 
so that at every iteration / > 1, there exists G ]0,1[ satisfying 

||z'(z')" (p,, (a:*) (a;')p*)|[ . ( 26 ) 

The local convergence rate of the sequence {a?*} generated by trap is controlled by the sequences 
and {cr*}, as shown in the following Theorem. 

Theorem 44 (Local linear convergence). Assume that the direction p yielded by Algorithm 2 satisfies 
( 26 ) if IIpII,^ < 7*A and An (x) = An (x + p), given 7* G ]0,72[. Under Assumptions 35, 32, 3f and 
41, for a small enough d, the sequence {2^*} generated by trap converges Q-linearly to x* if ff* < 1 
is small enough, where 


C ■■= limsup^' . 

I —^ + 00 

V f* = 0; Q-linear convergence ratio can be made arbitrarily small by properly choosing d, resulting 
in almost Q-superlinear convergence. 

Proof. Throughout the proof, we assume that I is large enough so that the active-set is An (x*) 
and that p‘ satisfies condition (26). This is ensured by Lemma 46 and Theorem 43, as the sequence 
{p*} converges to zero. Thus, we can write Z* = Z*. The orthogonal projection onto the subspace 
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Af is represented by the matrix Z* {Z*Y . A first-order development yields a positive sequence 

•[ 5*} converging to zero such that 


jz* {Z*y g 1^ < ||z* {Z*y (g + H pj ||^ -b Y ||p ||^ 

|z* {Z*y g(^y)\\^ + \\z* (Z*y (g,. (x‘)+H^. ("^0^01 

+ ^ I 


< 


2 S‘ 


Z* (Z*)' 


p , I I 
-+z -X 


+ cr 


1 \\z* (z*y {y - y)\ 

\\z*{z*yg{xyL 


Z* {Z*yg(x^) 


where the second inequality follows from the last inequality in Lemma 46, and the definition of go- 
in Eq. (11) and Ha in Eq. (23). However, from the computation of the Cauchy point described in 
paragraph 3.2 and Assumption 35, the term 

||z* iz*y{y-y)y 
jjz* {z*yg{y)\\^ 

is bounded by a constant C > 0. Hence, 

\\z* iz*y g {y+yy ^25' 


1 


\\zyz*ygiy)\\--yr^^ + ^ 

Moreover, a first-order development provides us with a constant T > 0 such that 

||z* {Z*y g(x^'jj^ < (^B + r) ||a:* . 

There also exists a positive sequence •[ e*} converging to zero such that 


y{Z*yg(x^+^^ Z* {Z*y H {x*) (x^+^ - x*^ 


I Z + 1 * 

— e\\x — X 


However, since — a;* lies in Af {x*y, Z* {Z*y — a;*) = a;*"*"^ — a;*. Thus, by Assumption (22), 


jz* {Z*yVL (a :*^^)||2 > Y-e) ||a;*+^ - a;* 

which implies that, for I large enough, there exists e G ]0,«;[ such that 

jz* {Z*y g (x^+^'^j^ >y-e) ||a:*+^ -a;*||_ 


Einally, 


a;*+^-a;*|| B + Y 

- - ^ -IT 

\\x^ — x *\\2 ~ K — e 



+ +d 



which yields the result. 


□ 



An Alternating Trust Region Algorithm for Distributed Linearly Constrained NLPs 


23 


5 Numerical Examples 

The optimal AC power flow constitutes a challenging class of nonconvex problems for benchmarking 
optimisation algorithms and software. It has been used very recently in the testing of a novel adaptive 
augmented Lagrangian technique (Curtis, F.E. and Gould, N.I.M. and Jiang, H. and Robinson, D.P., 
2014. To appear in Optimization Methods and Software). The power flow equations form a set of non¬ 
linear coupling constraints over a network. Some distributed optimisation strategies have already been 
explored for computing OPF solutions, either based on convex relaxations (Lam, A.Y.S. and Zhang, 
B. and Tse, D.N., 2012) or nonconvex heuristics (Kim, B.H. and Baldick, R., 1997). As the convex 
relaxation may fail in a significant number of cases (Bukhsh, W.A. and Grothey, A. and McKinnon, 
K.I.M. and Trodden, P.A., 2013), it is also relevant to explore distributed strategies for solving the 
OPF in its general nonconvex formulation. Naturally, all that we can hope for with this approach is a 
local minimum of the OPF problem. Algorithm 1 is tested on the augmented Lagrangian subproblems 
obtained via a polar coordinates formulation of the OPF equations, as well as rectangular coordinates 
formulations. Our TRAP algorithm is run as an inner solver inside a standard augmented Lagrangian 
loop (Bertsekas, D.P., 1982) and in the more sophisticated Lancelot dual loop (Conn, A. and Gould, 
N.I.M. and Toint, P.L., 1991). More precisely, if the OPF problem is written in the following form 

minimise / (a;) (27) 

X 

s.t. g {x) = 0 
X & X , 

where A is a bound constraint set, an augmented Lagrangian loop consists in computing an approx¬ 
imate critical point of the auxiliary program 


minimise Lg{x, g) := f{x) + (^g + g{x) (28) 

with g a dual variable associated to the power flow constraints and g > 0 a penalty parameter, which 
are both updated after a finite sequence of primal iterations in (28). Using the standard first-order 
dual update formula, only local convergence of the dual sequence can be proven (Bertsekas, D.P., 
1982). On the contrary, in the Lancelot outer loop, the dual variable g and the penalty parameter g 
are updated according to the level of satisfaction of the power flow (equality) constraints, resulting in 
global convergence of the dual sequence (Conn, A. and Gould, N.I.M. and Toint, P.L., 1991). In order 
to test TRAP, we use it to compute approximate critical points of the subproblems (27), which are of 
the form (1). The rationale behind choosing Lancelot instead of a standard augmented Lagrangian 
method as the outer loop is that Lancelot interrupts the inner iterations at an early stage, based on 
a KKT tolerance that is updated at every dual iteration. Hence, it does not allow one to really measure 
the absolute performance of trap, although it is likely more efficient than a standard augmented La¬ 
grangian for computing a solution of the OPF program. Thus, for all cases presented next, we provide 
the results of the combination of trap with a basic augmented Lagrangian and Lancelot. The aug¬ 
mented Lagrangian loop is utilised to show the performance of trap as a bound-constrained solver, 
whereas Lancelot is expected to provide better overall performance. All results are compared to the 
solution yielded by the nonlinear interior-point solver IPOPT (Wachter, A. and Biegler, L.T., 2006) 
with the sparse linear solver ma27. Finally, it is important to stress that the results presented in 
this Section are obtained from a preliminary matlab implementation, which is designed to handle 
small-scale problems. The design of a fully distributed software would involve substantial development 
and testing, and is thus beyond the scope of this paper. 
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5.1 AC Optimal Power Flow in Polar Coordinates 
We consider the AC-OPF problem in polar coordinates 


minimise ^ eg + cfp^ + C 2 (pf'j 


g^G 


s.t. 


^ pf = Pd + Pbb' + Gb vl 

g^Gb d^X>i^ b' 

^ <h = ^ Qd + ^ qtb' - Bb vl 


g&Gb 




b'gSb 


Pbb' = Gbbvl + {Gbb' cos {6b - Ob') + Bbb' sin {Ob -Of)) VbVf 
qtb' = -Bbbvl + {Gbb' sin {Ob - Ob') - Bbb' cos {Ob - Of)) VbVf 


(ptb'^j + {qbb')'^ + Sbf — (^Sbb''^ 


L ^ ^ U 

Vb <Vb< Vb 

L ^ G ^ U 
P <Pg <P 

L ^ G ^ U 

q <qg<q 

Sbf > 0 , 


(29) 


which corresponds to the minimisation of the overall generation cost, subject to power balance con¬ 
straints at every bus b and power flow constraints on every line bb' of the network, where Q denotes 
the set of generators and Qb is the set of generating units connected to bus b. The variables p® and qf 
are the active and reactive power output at generator g. The set of loads connected to bus b is denoted 
by Vb- The parameters and are the demand active and reactive power at load unit d. The 
letter Bb represents the set of buses connected to bus b. Variables ptf and qtf are the active and 
reactive power flow through line bb'. Variables Vb and Ob denote the voltage magnitude and voltage 
angle at bus b. Constants vt, vjt are lower and upper bounds on the voltage magnitude at bus b. Con¬ 
stants , p^, q^ and q^ are lower and upper bounds on the active and reactive power generation. It 
is worth noting that a slack variable Sbf has been added at every line bb' in order to turn the usual 
inequality constraint on the power flow through line bb' into an equality constraint. The derivation of 
the optimal power flow problem in polar form can be found in (Zhu, J., 2009). 

As a simple numerical test example for trap, we consider a particular instance of NLP (29) on 
the 9-bus transmission network shown in Fig. 2. As in (28), the augmented Lagrangian subproblem is 
obtained by relaxing the equality constraints associated with buses and lines in (29). The bound con¬ 
straints, which can be easily dealt with via projection, remain unchanged. One should notice that NLP 
(29) has partially separable constraints and objective, so that Lancelot could efficiently deal with 
it, yet in a purely centralised manner. In some sense, running trap in a Lancelot outer loop can be 
seen as a first step towards a distributed implementation of Lancelot for solving the AC-OPF. It 
is worth noting that the dual updates only require exchange of information between neighbouring 
nodes and lines. However, each Lancelot dual update requires a central communication, as the norm 
of the power flow constraints need to be compared with a running tolerance (Conn, A. and Gould, 
N.I.M. and Toint, P.L., 1991). For the 9-bus example in Fig. 2, the Cauchy search of trap on the 
augmented Lagrangian subproblem (28) can be carried out in five parallel steps. This can be observed 
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Fig. 2: The 9-bus transmission network from http://www.maths.ed.ac.uk/optenergy/LocalOpt/. 


by introducing local variables for every bus 6 G {1,..., 9}, 

Xb-={vb,0by , 


and for every line 

bb' G { {1, 4} , {4, 5} , {4,9} , {8, 9} , {2, 8} , {7, 8} , {6, 7} , {3,6} , {5,6 } } , 
with the line variable ybb' being defined as 

Vbb' • i_Pbb'7 Qbb': ^bb'') 

The line variables ybb' can be first updated in three parallel steps, which corresponds to 

{y{2,8}7y{6,7},y{4,9}} 7 {2/{7.8},y{3,6},y{4.5}} , {y{8,9},y{5.6}j2/{l,4}} • 

Then, the subset 


{a;i,a;2,a;3,a;5,a;7. Jig} 

can be updated, followed by the subset 

{xa,xq,X8,} ■ 

As a result, backtracking iterations can be run in parallel at the nodes associated with each line and 
bus. If a standard trust region Newton method would be applied, the projected search would have to 
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be computed on the same central node without a bound on the number iterations. Thus, the activity 
detection phase of trap allows one to reduce the number of global communications involved in the 
whole procedure. The results obtained via a basic augmented Lagrangian loop and a Lancelot outer 
loop are presented in Tables 1 and 2 below. The data is taken from the archive http: //www. maths. 
ed.ac.uk/optenergy/LocalOpt/. In all Tables of this Section, the first column corresponds to the 
index of the dual iteration, the second column to the number of iterations in the main loop of trap 
at the current outer step, the third column to the total number of sCG iterations at the current outer 
step, the fourth column to the level of KKT satisfaction obtained at each outer iteration, and the fifth 
column is the two-norm of the power flow equality constraints at a given dual iteration. To obtain 


Outer iter, 
count 

inner it. 

cum. sCG 

Inner KKT 

PF eq. constr. 

1 

79 

388 

2.01 • 10-'' 

0.530 

2 

2 

40 

2.71 ■ 10-1° 

0.530 

3 

300 

2215 

2.39 ■ 10-2 

0.292 

4 

101 

2190 

6.50 ■ 10-1 

6.56 ■ 10-3 

5 

123 

2873 

2.10 ■ 10-3 

5.02 ■ 10-s 

6 

56 

1194 

4.14 ■ 10-2 

1.11 ■ 10-1° 


Table 1: Results for the 9-bus AC-OPF (Fig. 2) using a standard augmented Lagrangian outer loop 
and TRAP as primal solver. Note that the cumulative number of CG iterations is relatively high, since 
the refinement stage was not preconditioned. 


Outer iter. 

count 

7 ^ inner it. 

# cum. sCG 

Inner KKT 

PF eq. constr. 

1 

37 

257 

7.29 

■ 10-2 

0.530 

2 

5 

25 

1.01 

• 10-2 

0.530 

3 

6 

71 

3.23 

■ 10-5 

0.530 

4 

100 

1330 

8.30 

■ 10-3 

4.33 ■ 10-2 

5 

100 

1239 

1.80 

■ 10-3 

2.53 ■ 10-3 

6 

100 

2269 

4.33 

■ 10-2 

2.69 ■ 10-5 

7 

64 

1541 

3.2 

10-3 

1.64- 10-3 


Table 2: Results for the 9-bus AG-OPF (Fig. 2) using a Lancelot outer loop and trap as primal 
solver. Note that the cumulative number of GG iterations is relatively high, since no preconditioner 
was applied in the refinement step. 


the results presented in Tables 1 and 2, the regularisation parameter a in the refinement stage 2 is set 
to 1 • 10“^°. For Table 1, the maximum number of iterations in the inner loop (trap) is fixed to 300 
and the stopping tolerance on the level of satisfaction of the KKT conditions to 1 • 10“^. For Table 2 
(Lancelot), the maximum number of inner iterations is set to 100 for the same stopping tolerance on 
the KKT conditions. In Algorithm 2, a block-diagonal preconditioner is applied. It is worth noting that 
the distributed implementation of Algorithm 2 is not affected by such a change. To obtain the results of 
Table 1, the initial penalty parameter q is set to 10 and is multiplied by 30 at each outer iteration. In the 
LANCELOT loop, it is multiplied by 100. In the end, an objective value of 2733.55 up to feasibility 1.64- 
10“® of the power flow constraints is obtained, whereas the interior-point solver ipopt, provided 
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with the same primal-dual initial guess, yields an objective value of 2733.5 up to feasibility 2.23 • 
10“^^. From Table 1, one can observe that a very tight KKT satisfaction can be obtained with 
TRAP. From the figures of Tables 1 and 2, one can extrapolate that Lancelot would perform better 
in terms of computational time (6732 sCG iterations in total) than a basic augmented Lagrangian outer 
loop (8900 sCG iterations in total), yet with a worse satisfaction of the power flow constraints (1.64 • 
10~® against 1.11 • 10~^°). Finally, one should mention that over a set of hundred random initial 
guesses, trap was able to find a solution satisfying the power flow constraints up to 1 • 10“^ in all 
cases, whereas ipopt failed in approximately half of the test cases, yielding a point of local infeasibility. 


5.2 AG Optimal Power Flow on Distribution Networks 

Algorithm 1 is then applied to solve two AG-OPF problems in rectangular coordinates on distribu¬ 
tion networks. Both 47-bus and 56-bus networks are taken from (Gan, L. and Li, N. and Topcu, 
U. and Low, S.H., 2014). Our results are compared against the nonlinear interior-point solver ipopt 
(Wachter, A. and Biegler, L.T., 2006), which is not amenable to a fully distributed implementation, 
and the SOGP relaxation proposed by (Gan, L. and Li, N. and Topcu, U. and Low, S.H., 2014), which 
may be distributed (as convex) but fails in some cases, as shown next. It is worth noting that any 
distribution network is a tree, so a minimum colouring scheme consists of two colours, resulting in 4 
parallel steps for the activity detection in trap. 


5.2.1 On the 56-bus AC-OPF 

On the 56-bus AG-OPF, an objective value of 233.9 is obtained with feasibility 8.00 • 10“^, whereas 
the nonlinear solver ipopt yields an objective value of 233.9 with feasibility 5.19 • 10“^ for the same 
initial primal-dual guess. 

In order to increase the efficiency of trap, following a standard recipe, we build a block-diagonal 
preconditioner from the hessian of the augmented Lagrangian by extracting block-diagonal elements 
corresponding to buses and lines. Thus, constructing and using the preconditioner can be done in 
parallel and does not affect the distributed nature of trap. In Fig. 3, the satisfaction of the KKT 
conditions for the bound constrained problem (28) is plotted for a preconditioned refinement phase 
and non-preconditioned one. One can conclude from Fig. 3 that preconditioning the refinement phase 
does not only affect the number of iterations of the sGG Algorithm 2 (Fig. 6), but also the perfor¬ 
mance of the main loop of trap. From a distributed perspective, it is very appealing, for it leads 
to a strong decrease in the overall number of global communications. Finally, from Fig. 3, it appears 
that TRAP and a centralised trust region method (with centralised projected search) are equivalent in 
terms of convergence speed. From Fig. 4, trap proves very efficient at identifying the optimal active 
set in a few iterations (more than 10 constraints enter the active-set in the first four iterations and 
about 20 constraints are dropped in the following two iterations), which is a proof of concept for the 
analysis of Section 4. Alternating gradient projections appear to be as efficient as a projected search 
for identifying an optimal active-set, although the iterates travel on different faces, as shown in Fig. 
4. In Fig. 5, the power flow constraints are evaluated after a run of trap on program (28). The dual 
variables and penalty coefficient are updated at each outer iteration. Overall, the coupling of trap 
with the augmented Lagrangian appears to be successful and provides similar performance to the 
coupling with a centralised trust region algorithm. 

Tables 3 and 4 are obtained with an initial penalty coefficient p = 10 and a multiplicative coefficient 
of 20. 
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Fig. 3: KKT satisfaction vs iteration count in the fourth Lancelot subproblem formed on the AC- 
OPF with 56 buses. When using a centralised projected search as activity detector (dotted grey) and 
TRAP (full black). Curves obtained with a preconditioned sCG are highlighted with triangle markers. 



Fig. 4: Active-set history in the first Lancelot iteration for the 56-bus AC-OPF. Activity detection 
in TRAP: TRAP (full black), centralised projected search (dashed grey with triangles). 


5.2.2 On the 47-bus AC-OPF 

On the 47-bus AC-OPF, a generating unit was plugged at node 12 (bottom of the tree) and the load at 
the substation was decreased to 3 pu. On this modified problem, the SOCP relaxation provides a solu¬ 
tion, which does not satisfy the nonlinear equality constraints. An objective value of 502.3 is obtained 
with feasibility 2.57• 10“^ for both the AL loop (Tab. 5) and the Lancelot loop (Tab. 6). The SOCP 
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Fig. 5: Norm of power flow constraints on the 56-bus network against dual iterations of a Lancelot 
outer loop with trap as primal solver. Inner solver: trap (full black), centralised trust region method 
(dashed grey with cross markers). 



Fig. 6: Cumulative sCG iterations vs iteration count in the first Lancelot subproblem formed on 
the AC-OPF with 56 buses. Results obtained with trap as inner solver (full black), with a cen¬ 
tralised trust region method (dashed grey). Results obtained with a preconditioned refinement stage 
are highlighted with cross markers. 


relaxation returns an objective value of 265.75, but physically impossible, as the power flow constraints 
are not satisfied. The nonlinear solver ipopt yields an objective value of 502.3 with feasibility 5.4-10“®. 
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Outer iter, 
count 

7 ^ inner it. 

# cum. sCG 

Inner KKT 

PF eq. constr. 

1 

122 

1382 

8.45 ■ 10“^ 

6.68 

2 

189 

4486 

6.71 • 10"^ 

1.49 ■ 10“’ 

3 

139 

11865 

9.87- 10-® 

8.79 ■ 10-4 

4 

49 

3958 

6.75 ■ 10-® 

7.92 ■ 10-® 

5 

9 

936 

5.45 ■ 10-’’ 

4.58 ■ 10-^ 


Table 3: Results for the 56-bus AC-OPF of (Gan, L. and Li, N. and Topcu, U. and Low, S.H., 2014) 
using a (local) augmented Lagrangian outer loop with trap as primal solver. 


Outer iter, 
count 

7 ^ inner it. 

7 ^ cum. sCG 

Inner KKT 

PF eq. constr. 

1 

100 

924 

9.74 ■ 10-^ 

6.42 

2 

133 

3587 

2.40 ■ 10-3 

3.60- 10-4 

3 

54 

4531 

1.03 ■ 10-4 

4.00- 10-3 

4 

10 

858 

4.20 ■ 10-® 

1.02 ■ 10-3 

5 

42 

3288 

4.37- 10-® 

2.32 ■ 10-4 

6 

13 

916 

1.82 ■ 10-5 

4.35 ■ 10-5 

7 

40 

6878 

3.70 ■ lO-’ 

8.16 ■ 10-8 

8 

6 

420 

4.64 ■ 10-8 

4.97- lO-’ 


Table 4: Results for the 56-bus AC-OPF of (Gan, L. and Li, N. and Topcu, U. and Low, S.H., 2014) 
using a Lancelot outer loop with trap as primal solver. 


Outer iter. 

count 

7 ^ inner it. 

7 ^ cum. sGG 

Inner KKT 

PF eq. constr. 

1 

275 

3267 

1.33 ■ 10-’ 

5.80 

2 

300 

7901 

1.39 ■ 10-4 

1.12 ■ 10-4 

3 

180 

18725 

2.13 ■ 10-8 

9.47- 10-5 

4 

26 

3765 

5.55 • 10-3 

6.63 ■ 10-8 


Table 5: Results for the 47-bus AC-OPF of (Gan, L. and Li, N. and Topcu, U. and Low, S.H., 2014) 
using an augmented Lagrangian outer loop with trap as primal solver. 


Outer iter, 
count 

7 ^ inner it. 

7 ^ cum. sCG 

Inner KKT 

PF eq 

constr. 

1 

180 

1147 

8.64 ■ 10-8 

5 

35 

2 

300 

7128 

2.23 

3.12 

10-4 

3 

215 

11304 

4.65 • 10-5 

2.97 

10-3 

4 

9 

423 

6.05 • 10-5 

3.28 

10-5 

5 

8 

503 

1.11 • 10-3 

7.90 

10-’ 

6 

2 

177 

4.64 ■ 10-8 

4.03 

10-3 


Table 6: Results for the 47-bus AC-OPF of (Gan, L. and Li, N. and Topcu, U. and Low, S.H., 2014) 
using a Lancelot outer loop with trap as primal solver. 
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6 Conclusions 

A novel trust region Newton method, entitled trap, which is based on distributed activity detection, 
has been described and analysed. In particular, as a result of a proximal regularisation of the trust 
region problem with respect to the Cauchy point yielded by an alternating projected gradient sweep, 
global and fast local convergence to first-order critical points has been proven under standard regu¬ 
larity assumptions. It has been argued further how the approach can be implemented in distributed 
platforms. The proposed strategy has been successfully applied to solve various nonconvex OFF prob¬ 
lems, for which distributed algorithms are currently raising interest. The performance of the novel 
activity detection mechanism compares favourably against the standard projected search. 
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